The American National Election Studies (ANES) releases survey data every four years in which participants answer a lengthy battery of questions. It is widely regarded as one of the most scientifically rigorous political surveys administered. Let’s load the data.
a1 <- read.csv("C:/Users/Chris/Desktop/RWorking/anes2016.csv")
First, I’m going to change the variable names and set them as continuous and categorical.
a1$clinton <- as.numeric(a1$V162078)
a1$limbaugh <- as.factor(a1$V161428)
a1$age <- as.numeric(a1$V161267)
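One general R pitfall worth flagging here as a side note (not necessarily a problem with this particular file): calling `as.numeric()` directly on a factor returns the internal level codes, not the original values. A minimal illustration:

```r
# General R pitfall: as.numeric() on a factor yields level indices.
f <- factor(c("10", "50", "90"))
as.numeric(f)                # 1 2 3 -- the level codes, not the values
as.numeric(as.character(f))  # 10 50 90 -- convert via character first
```

If a thermometer column were ever read in as character or factor, `as.numeric(as.character(...))` would be the safer conversion.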
Perhaps the best-known of the ANES measures is the “Feeling Thermometer,” which records participant attitudes towards presidential candidates on a scale from 0 to 100.
As you can see, I’m using the feeling thermometer for Hillary Clinton in the 2016 US Presidential Election as our dependent variable, listening to Rush Limbaugh as an independent categorical variable, and the age of the respondent as an independent continuous variable.
Quickly, I’m going to mean center the age variable.
a1$age_c <- a1$age - mean(a1$age)
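As a quick sanity check on what centering buys us, here is a toy illustration on simulated data (not the ANES file): centering a predictor moves the intercept to the predictor’s mean but leaves the slope untouched.

```r
# Toy illustration with simulated data: centering changes only the intercept.
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)
y <- 2 + 0.3 * x + rnorm(100)
fit_raw      <- lm(y ~ x)
fit_centered <- lm(y ~ I(x - mean(x)))
coef(fit_raw)[2]       # slope from the raw model
coef(fit_centered)[2]  # identical slope from the centered model
```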
boxplot(a1$age_c)
boxplot(a1$limbaugh)
boxplot(a1$clinton)
a2 <- lm(clinton ~ limbaugh*age_c, data=a1)
layout(matrix(1, 2, 2))
residualPlots(a2)
## Test stat Pr(>|Test stat|)
## limbaugh
## age_c 0.7195 0.4719
## Tukey test 0.1045 0.9167
My Limbaugh variable is dichotomous… that is, people either listened to Limbaugh or they didn’t listen to him. As you can see, in relation to my Clinton variable, my Limbaugh listeners are quite strangely distributed. (The non-Limbaugh listeners appear to be more normal, no pun intended.)
We will have to make some attempt to address that.
a2res <- stdres(a2)
layout(matrix(1, 2, 2))
hist(a2res, freq = FALSE)
curve(dnorm, add = TRUE)
mean.a2res <- mean(a2res)
print(mean.a2res)
## [1] 6.835472e-06
Ideally our residuals would resemble a normal distribution more closely than they do, but the mean of the residuals is very close to zero, and no standardized residuals fall beyond +/- 3 SDs.
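The +/- 3 SD check can be made explicit rather than eyeballed. A hedged sketch on a simulated fit (in the analysis above this would be run on `a2res`):

```r
# Counting standardized residuals beyond +/-3 SD, shown on simulated data.
set.seed(2)
d <- data.frame(x = rnorm(200))
d$y <- 1 + 2 * d$x + rnorm(200)
fit <- lm(y ~ x, data = d)
r <- rstandard(fit)   # base-R standardized residuals; MASS::stdres is analogous
sum(abs(r) > 3)       # how many observations get flagged
```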
layout(matrix(1, 2, 2))
qqPlot(a2, id.n=8)
## 2285 3319
## 1062 1521
par(mfcol=c(2,2))
plot(a2)
The qq-plot is actually not as bad as I would have expected, and my leverage plot isn’t too terrible, either.
layout(matrix(1, 2, 2))
influenceIndexPlot((a2), id.n=8)
I have some outliers, but I’m not overly concerned about outlier influence as my sample size is quite large.
inflm.a2 <- influence.measures(a2)
which(apply(inflm.a2$is.inf, 1, any))
## 6 8 25 88 159 177 194 207 225 264 282 286 312 327 329 434
## 4 6 18 53 84 94 102 109 120 131 142 145 158 166 167 218
## 435 478 501 515 518 531 532 548 555 557 611 629 633 644 693 698
## 219 239 251 258 260 268 269 278 282 284 309 316 318 324 344 346
## 720 741 755 759 763 770 829 841 859 866 889 906 924 955 962 963
## 357 367 378 380 383 384 413 419 426 431 441 452 458 474 479 480
## 988 989 996 1005 1021 1025 1040 1048 1110 1116 1133 1144 1156 1169 1193 1229
## 488 489 492 495 508 511 519 524 549 554 560 565 573 576 591 607
## 1265 1271 1275 1295 1298 1371 1372 1389 1394 1413 1457 1478 1494 1527 1545 1556
## 621 623 626 632 634 660 661 670 671 678 697 706 714 727 735 740
## 1560 1562 1586 1614 1644 1645 1649 1669 1678 1705 1711 1731 1744 1758 1759 1828
## 741 742 752 763 778 779 781 789 793 807 810 823 831 840 841 869
## 1936 1963 1998 2011 2031 2078 2110 2120 2123 2162 2180 2187 2193 2199 2254 2261
## 908 920 933 939 949 972 986 990 992 1008 1019 1024 1027 1029 1048 1052
## 2285 2289 2304 2314 2325 2337 2364 2410 2430 2438 2456 2481 2512 2535 2546 2592
## 1062 1064 1071 1078 1084 1092 1104 1126 1135 1137 1142 1149 1162 1171 1180 1202
## 2596 2603 2610 2661 2662 2674 2716 2758 2761 2778 2789 2838 2847 2860 2872 2889
## 1204 1210 1214 1238 1239 1245 1259 1275 1278 1285 1289 1304 1307 1314 1319 1326
## 2920 2966 2994 3004 3008 3010 3013 3020 3052 3056 3078 3157 3176 3183 3195 3218
## 1337 1352 1366 1371 1373 1375 1377 1379 1396 1398 1405 1444 1455 1457 1464 1474
## 3220 3270 3294 3312 3319 3329 3338 3346 3360 3376 3406 3422 3424 3443 3482 3491
## 1476 1499 1508 1517 1521 1523 1526 1529 1537 1544 1555 1561 1562 1572 1590 1594
## 3503 3527 3589 3591 3618 3629 3663 3665 3710 3727 3760 3800 3828 3854 3873 3892
## 1601 1612 1637 1638 1648 1652 1667 1668 1686 1694 1713 1727 1742 1753 1764 1774
## 3914 3918 3943 3954 3968 4024 4040 4064 4078 4080 4088 4121 4167 4201 4202 4225
## 1782 1783 1795 1802 1809 1827 1835 1843 1849 1851 1854 1867 1884 1893 1894 1903
## 4237 4238
## 1909 1910
#summary(inflm.a2)
R flagged 185 influential observations, which is not quite 10% of our sample. I’m going to try some transformations on the Clinton variable to see if I can clean up the residuals a little bit.
par(mfcol=c(2,2))
boxplot(a1$clinton)
boxplot(log10(.0001+a1$clinton))
boxplot(sqrt(a1$clinton))
boxplot(1/(.0001+a1$clinton))
It looks like the square root transformation (top right) is pretty good. Let’s test it in comparison to our Limbaugh variable which seems to be having the most trouble.
par(mfcol=c(1,2))
boxplot(a1$clinton~a1$limbaugh)
boxplot(sqrt(a1$clinton)~a1$limbaugh)
The square root transformation does appear to be better. It’s not perfect, but data rarely is! Let’s check assumptions again.
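As a side note on why the square root helps: here is a toy illustration with simulated floor-heavy 0–100 scores (an assumption standing in for the listener group’s distribution, not the actual ANES values), comparing a rough skewness measure before and after the transform.

```r
# Simulated floor-heavy scores: the sqrt transform shrinks the long right tail.
set.seed(3)
z <- pmin(100, rexp(1000, rate = 1/25))            # scores piled up near 0
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3  # rough moment-based skew
skewness(z)        # strongly positive
skewness(sqrt(z))  # noticeably closer to 0
```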
a1$clinroot <- sqrt(a1$clinton)
a3 <- lm(clinroot ~ limbaugh*age_c, data=a1)
layout(matrix(1, 2, 2))
residualPlots(a3)
## Test stat Pr(>|Test stat|)
## limbaugh
## age_c 0.3640 0.7159
## Tukey test 0.1554 0.8765
This helped my Limbaugh listeners a lot. There appear to be some floor effects in my overall regression model.
a3res <- stdres(a3)
layout(matrix(1, 2, 2))
hist(a3res, freq = FALSE)
curve(dnorm, add = TRUE)
mean.a3res <- mean(a3res)
print(mean.a3res)
## [1] 5.161885e-06
The distribution of residuals is slightly better, but still not great. Mean of residuals is still basically zero.
layout(matrix(1, 2, 1))
qqPlot(a3, id.n=8)
## 2285 2481
## 1062 1149
par(mfcol=c(2,2))
plot(a3)
The qq-plot is pretty messy.
layout(matrix(1, 2, 2))
influenceIndexPlot((a3))
There are still some outliers, obviously. Again, I expect our sample size is resilient enough to show effects even with some outlier influence.
inflm.a3 <- influence.measures(a3)
which(apply(inflm.a3$is.inf, 1, any))
## 8 88 159 177 194 225 264 282 286 312 327 329 434 435 478 501
## 6 53 84 94 102 120 131 142 145 158 166 167 218 219 239 251
## 515 531 532 548 555 557 611 629 633 693 698 720 741 755 759 763
## 258 268 269 278 282 284 309 316 318 344 346 357 367 378 380 383
## 770 841 889 906 924 955 962 963 988 989 996 1005 1021 1025 1040 1048
## 384 419 441 452 458 474 479 480 488 489 492 495 508 511 519 524
## 1110 1116 1133 1144 1156 1193 1229 1265 1271 1275 1295 1298 1371 1372 1413 1457
## 549 554 560 565 573 591 607 621 623 626 632 634 660 661 678 697
## 1478 1494 1527 1545 1556 1560 1562 1586 1644 1645 1649 1669 1678 1705 1711 1731
## 706 714 727 735 740 741 742 752 778 779 781 789 793 807 810 823
## 1744 1758 1759 1828 1936 1998 2031 2110 2120 2123 2180 2187 2193 2199 2254 2285
## 831 840 841 869 908 933 949 986 990 992 1019 1024 1027 1029 1048 1062
## 2289 2304 2314 2325 2337 2364 2410 2430 2438 2456 2481 2512 2535 2546 2592 2603
## 1064 1071 1078 1084 1092 1104 1126 1135 1137 1142 1149 1162 1171 1180 1202 1210
## 2610 2661 2662 2674 2761 2778 2789 2838 2847 2860 2872 2889 2920 2966 3008 3010
## 1214 1238 1239 1245 1278 1285 1289 1304 1307 1314 1319 1326 1337 1352 1373 1375
## 3013 3020 3052 3056 3078 3157 3176 3195 3196 3270 3294 3319 3329 3346 3360 3406
## 1377 1379 1396 1398 1405 1444 1455 1464 1465 1499 1508 1521 1523 1529 1537 1555
## 3422 3482 3503 3527 3591 3618 3629 3663 3665 3710 3727 3760 3800 3828 3843 3854
## 1561 1590 1601 1612 1638 1648 1652 1667 1668 1686 1694 1713 1727 1742 1749 1753
## 3873 3892 3918 3943 3954 3968 4040 4064 4078 4080 4121 4167 4201 4202 4225 4237
## 1764 1774 1783 1795 1802 1809 1835 1843 1849 1851 1867 1884 1893 1894 1903 1909
## 4238
## 1910
#summary(inflm.a3)
R flagged noticeably fewer influential observations this time. That said, our sample size is large enough that it can take some outliers.
I think I’m going to stick with the square root transformation. The distribution of residuals is not so skewed, although admittedly it still doesn’t represent normality.
Anova(a3, contrasts=list(limbaugh=contr.sum, age_c=contr.sum), type=3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 59470 1 5508.1811 < 2e-16 ***
## limbaugh 1523 1 141.0528 < 2e-16 ***
## age_c 5 1 0.4557 0.49973
## limbaugh:age_c 43 1 3.9423 0.04723 *
## Residuals 20708 1918
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(a3)
##
## Call:
## lm(formula = clinroot ~ limbaugh * age_c, data = a1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.017 -2.101 0.936 2.509 7.770
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.920083 0.079767 74.217 <2e-16 ***
## limbaugh1 -3.070863 0.258565 -11.877 <2e-16 ***
## age_c -0.003270 0.004844 -0.675 0.4997
## limbaugh1:age_c -0.028908 0.014560 -1.986 0.0472 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.286 on 1918 degrees of freedom
## Multiple R-squared: 0.09626, Adjusted R-squared: 0.09485
## F-statistic: 68.1 on 3 and 1918 DF, p-value: < 2.2e-16
Limbaugh listening is significant, and the interaction between limbaugh listening and age is also significant. The model accounts for 10% of the variance, and the overall model is significant.
(a3.means <- interactionMeans(a3))
## limbaugh adjusted mean std. error
## 1 0 5.920083 0.07976708
## 2 1 2.849220 0.24595328
Here are the means for the model at the two Limbaugh conditions.
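These adjusted means can be recovered by hand from the `summary(a3)` coefficients, evaluated at `age_c = 0` (i.e., at the mean age, since age is centered):

```r
# Reconstructing the interactionMeans() output from the summary(a3) estimates.
b0   <- 5.920083    # intercept: predicted clinroot for a non-listener at mean age
b_lb <- -3.070863   # limbaugh1 coefficient
b0            # 5.920083, the limbaugh = 0 adjusted mean
b0 + b_lb     # 2.849220, the limbaugh = 1 adjusted mean
```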
a3res <- lmres(clinroot ~ limbaugh*age_c, data=a1)
simSlopes <- simpleSlope(a3res, pred="age_c", mod1="limbaugh1")
summary(simSlopes)
##
## ** Estimated points of clinroot **
##
## Low age_c (-1 SD) High age_c (+1 SD)
## Low limbaugh1 (-1 SD) 6.5025 6.5898
## High limbaugh1 (+1 SD) 4.8563 4.3320
##
##
##
## ** Simple Slopes analysis ( df= 1918 ) **
##
## simple slope standard error t-value p.value
## Low limbaugh1 (-1 SD) 0.0026 0.0065 0.41 0.685
## High limbaugh1 (+1 SD) -0.0158 0.0065 -2.41 0.016 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
##
## ** Bauer & Curran 95% CI **
##
## lower CI upper CI
## limbaugh1 -17.514 0.2125
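It helps to see where these slopes come from. Under the dummy coding shown in `summary(a3)`, the simple slope of `age_c` is just the `age_c` coefficient for non-listeners and the sum of the `age_c` and interaction coefficients for listeners. (pequod’s `simpleSlope` evaluates the moderator at ±1 SD around its mean rather than at 0/1, which is why its printed values differ in scale.)

```r
# Simple slopes of age_c from the dummy-coded summary(a3) coefficients.
b_age <- -0.003270   # age_c estimate
b_int <- -0.028908   # limbaugh1:age_c estimate
slope_nonlistener <- b_age           # -0.00327
slope_listener    <- b_age + b_int   # -0.032178
```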
This simple slopes analysis highlights my interaction. For those individuals who aren’t Limbaugh listeners, age has no effect on Clinton-liking. However, for those individuals who listen to the ’baugh, scores on the Clinton thermometer decrease as age increases. Here’s a plot that visualizes the results!
graph <- ggplot(a1, aes(age_c, clinroot, color=limbaugh)) + stat_smooth(method=lm) +
  ggtitle("Age and Limbaugh Listening as Factors \n in Hillary Clinton Feeling Thermometer") +
scale_x_continuous(name = "Age (Centered)") +
scale_y_continuous(name = "Clinton Feeling Thermometer") +
theme(axis.line.x = element_line(linewidth=.5, colour = "black"),
axis.line.y = element_line(linewidth=.5, colour = "black"),
axis.text.x=element_text(colour="black", size = 9),
axis.text.y=element_text(colour="black", size = 9),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
plot.title = element_text(size = 14, family = "serif", face = "bold", hjust = .5),
text=element_text(family="serif"))
graph <- graph + scale_colour_discrete(name="Limbaugh\nListener",
                                       breaks=c("0", "1"),
                                       labels=c("No", "Yes"))
graph
As you can see, the plot suggests that Age is a significant factor in Clinton-liking for Limbaugh listeners but not for non-Limbaugh listeners. Pretty cool!
I’m going to use the same model as before, but this time I’m going to account for sex as a covariate. Introducing the new variable in the dataset.
b1$clinton <- as.numeric(b1$V162078)
b1$limbaugh <- as.factor(b1$V161428)
b1$age <- as.numeric(b1$V161267)
b1$sex <- as.factor(b1$V161342)
Variables are renamed and set as categorical or continuous.
Centering age.
b1$age_c <- b1$age - mean(b1$age)
I’m going to go through the same steps to check residuals.
ba <- lm(clinton ~ limbaugh*age_c*sex, data=b1)
layout(matrix(1, 2, 2))
residualPlots(ba)
## Test stat Pr(>|Test stat|)
## limbaugh
## age_c 0.6657 0.5057
## sex
## Tukey test 1.4449 0.1485
My Limbaugh listeners are distributed oddly, as expected.
bares <- stdres(ba)
layout(matrix(1, 2, 2))
hist(bares, freq = FALSE)
curve(dnorm, add = TRUE)
mean.bares <- mean(bares)
print(mean.bares)
## [1] 3.24306e-05
The distribution of residuals is off, but the mean is essentially zero.
layout(matrix(1, 2, 2))
qqPlot(ba, id.n=8)
## 2285 2481
## 1056 1142
par(mfcol=c(2,2))
plot(ba)
The qq-plot looks a little rough, though I’ve seen worse.
layout(matrix(1, 2, 2))
influenceIndexPlot((ba), id.n=8)
There are definitely some outliers here. If my sample size were smaller, I might consider removing some of these in an outlier sweep. As it is, I don’t think my overall findings will be too affected by these points.
inflm.ba <- influence.measures(ba)
which(apply(inflm.ba$is.inf, 1, any))
## 8 88 112 159 177 194 207 264 286 312 327 434 435 478 501 515
## 6 53 64 83 93 101 108 130 144 157 164 216 217 237 249 256
## 531 532 548 555 557 611 629 644 693 720 741 759 763 770 829 841
## 265 266 275 279 281 305 312 319 339 352 362 375 378 379 408 414
## 859 889 906 924 955 962 963 988 989 996 1005 1021 1025 1040 1110 1116
## 421 436 447 453 469 474 475 483 484 487 490 503 506 514 544 549
## 1133 1144 1156 1193 1229 1265 1271 1275 1295 1371 1372 1389 1478 1494 1510 1527
## 555 560 568 586 602 616 618 621 627 654 655 664 700 708 713 721
## 1545 1556 1560 1562 1586 1644 1649 1669 1678 1705 1711 1731 1744 1758 1759 1828
## 729 734 735 736 746 772 775 783 787 801 804 817 825 834 835 863
## 1936 1998 2031 2110 2120 2123 2187 2193 2199 2254 2285 2289 2304 2314 2325 2364
## 902 927 943 980 984 986 1018 1021 1023 1042 1056 1058 1065 1071 1077 1097
## 2410 2430 2456 2481 2512 2535 2546 2592 2603 2610 2662 2674 2716 2758 2761 2778
## 1119 1128 1135 1142 1155 1164 1173 1195 1203 1207 1232 1238 1252 1268 1271 1278
## 2838 2847 2860 2872 2920 2966 2994 3004 3008 3013 3020 3052 3056 3078 3157 3176
## 1297 1300 1307 1312 1330 1345 1359 1364 1366 1370 1372 1389 1391 1398 1437 1448
## 3195 3196 3218 3270 3294 3319 3329 3338 3346 3360 3406 3422 3443 3482 3503 3589
## 1457 1458 1467 1491 1500 1513 1515 1518 1521 1529 1547 1553 1564 1582 1593 1627
## 3618 3629 3665 3700 3727 3760 3800 3828 3854 3873 3892 3918 3943 3954 3968 4024
## 1638 1642 1657 1670 1683 1702 1716 1731 1742 1753 1763 1772 1784 1791 1798 1816
## 4040 4064 4078 4080 4088 4121 4167 4201 4202 4222 4225 4238
## 1824 1832 1838 1840 1843 1856 1873 1882 1883 1891 1892 1899
#summary(inflm.ba)
R flagged 189 influential observations. Let’s perform the square root transformation on my Clinton variable and see if that helps with my assumptions.
b1$clinroot <- sqrt(b1$clinton)
bb <- lm(clinroot ~ limbaugh*age_c*sex, data=b1)
layout(matrix(1, 2, 2))
residualPlots(bb)
## Test stat Pr(>|Test stat|)
## limbaugh
## age_c 0.3343 0.7382
## sex
## Tukey test 1.4606 0.1441
Our Limbaugh listeners are looking much healthier!
bbres <- stdres(bb)
layout(matrix(1, 2, 2))
hist(bbres, freq = FALSE)
curve(dnorm, add = TRUE)
mean.bbres <- mean(bbres)
print(mean.bbres)
## [1] 2.904784e-05
This histogram looks great compared to some of the models I’ve made with this dataset! (It’s still a little rough.)
par(mfcol=c(1,1))
qqPlot(bb, id.n=8)
## 2285 2481
## 1056 1142
par(mfcol=c(2,2))
plot(bb)
The qq-plot is of course not great. I’ve got some skew in my residuals-vs-fitted plot, but the fanning is not too bad, suggesting homoscedasticity.
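Eyeballing the fan shape can be complemented with a formal check: `car::ncvTest()` runs a score test for non-constant error variance. A sketch on a simulated fit that is homoscedastic by construction (in the analysis above this would be called on `bb`):

```r
library(car)  # assumed already loaded for residualPlots()/qqPlot()

set.seed(4)
d <- data.frame(x = rnorm(300))
d$y <- 1 + 0.5 * d$x + rnorm(300)   # constant error variance by construction
ncvTest(lm(y ~ x, data = d))        # a small p-value would indicate heteroscedasticity
```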
layout(matrix(1, 2, 2))
influenceIndexPlot((bb), id.n=8)
Obviously still have outliers.
inflm.bb <- influence.measures(bb)
which(apply(inflm.bb$is.inf, 1, any))
## 8 88 112 159 177 194 207 286 312 327 434 435 478 501 515 532
## 6 53 64 83 93 101 108 144 157 164 216 217 237 249 256 266
## 555 557 611 629 644 693 720 759 763 770 829 841 859 889 906 924
## 279 281 305 312 319 339 352 375 378 379 408 414 421 436 447 453
## 955 962 963 988 989 996 1005 1021 1025 1040 1110 1116 1133 1144 1156 1271
## 469 474 475 483 484 487 490 503 506 514 544 549 555 560 568 618
## 1275 1295 1371 1372 1478 1494 1510 1527 1545 1556 1560 1562 1586 1644 1669 1678
## 621 627 654 655 700 708 713 721 729 734 735 736 746 772 783 787
## 1705 1711 1731 1744 1758 1759 1828 1936 1998 2031 2110 2120 2123 2187 2193 2199
## 801 804 817 825 834 835 863 902 927 943 980 984 986 1018 1021 1023
## 2254 2285 2289 2304 2314 2325 2364 2410 2430 2456 2481 2535 2546 2592 2603 2610
## 1042 1056 1058 1065 1071 1077 1097 1119 1128 1135 1142 1164 1173 1195 1203 1207
## 2662 2716 2758 2761 2778 2838 2847 2860 2872 2920 2966 2994 3004 3008 3013 3020
## 1232 1252 1268 1271 1278 1297 1300 1307 1312 1330 1345 1359 1364 1366 1370 1372
## 3052 3056 3078 3157 3176 3195 3196 3218 3270 3294 3319 3329 3338 3346 3360 3406
## 1389 1391 1398 1437 1448 1457 1458 1467 1491 1500 1513 1515 1518 1521 1529 1547
## 3422 3443 3482 3503 3589 3618 3629 3665 3700 3727 3760 3800 3828 3854 3873 3892
## 1553 1564 1582 1593 1627 1638 1642 1657 1670 1683 1702 1716 1731 1742 1753 1763
## 3918 3943 3954 3968 4024 4040 4078 4080 4088 4121 4167 4201 4202 4222 4225 4238
## 1772 1784 1791 1798 1816 1824 1838 1840 1843 1856 1873 1882 1883 1891 1892 1899
#summary(inflm.bb)
R flagged 37 fewer influential observations.
Sticking with the root transformation. Okay, here’s my original model.
b2 <- lm(clinroot ~ limbaugh*age_c, data=b1)
Anova(b2, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 58898 1 5445.4044 < 2e-16 ***
## limbaugh 1513 1 139.8506 < 2e-16 ***
## age_c 6 1 0.5763 0.44786
## limbaugh:age_c 41 1 3.8185 0.05084 .
## Residuals 20626 1907
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b2)
##
## Call:
## lm(formula = clinroot ~ limbaugh * age_c, data = b1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0207 -2.1023 0.8645 2.5216 7.7700
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.910817 0.080100 73.793 <2e-16 ***
## limbaugh1 -3.061551 0.258886 -11.826 <2e-16 ***
## age_c -0.003692 0.004864 -0.759 0.4479
## limbaugh1:age_c -0.028486 0.014578 -1.954 0.0508 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.289 on 1907 degrees of freedom
## Multiple R-squared: 0.09615, Adjusted R-squared: 0.09473
## F-statistic: 67.62 on 3 and 1907 DF, p-value: < 2.2e-16
Apparently the removal of some of the participants who didn’t answer sex with a 1 or a 2 has caused my interaction to become only marginally significant. We should check to see if my Clinton thermometer even has a relationship with sex.
b3 <- lm(clinroot ~ sex, data=b1)
Anova(b3, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 21881.5 1 1883.420 < 2.2e-16 ***
## sex 641.6 1 55.226 1.612e-13 ***
## Residuals 22178.7 1909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Well that’s a resounding yes! Sex is highly related to scores on the Clinton thermometer. Let’s include sex in the first linear model equation as a main effect.
b4 <- lm(clinroot ~ sex + limbaugh*age_c, data=b1)
Anova(b4, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 23612.5 1 2225.0993 < 2.2e-16 ***
## sex 399.9 1 37.6875 1.008e-09 ***
## limbaugh 1301.0 1 122.6012 < 2.2e-16 ***
## age_c 4.1 1 0.3828 0.53616
## limbaugh:age_c 51.9 1 4.8953 0.02705 *
## Residuals 20226.3 1906
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When controlling for sex, the interaction has become significant again. Let’s see if sex confounds with age in our sample. I have no reason to think that it would, but better safe than sorry.
b5 <- lm(age_c ~ sex, data=b1)
Anova(b5, type = 3)
## Anova Table (Type III tests)
##
## Response: age_c
## Sum Sq Df F value Pr(>F)
## (Intercept) 284 1 1.0242 0.3116
## sex 538 1 1.9418 0.1636
## Residuals 528651 1909
summary(b5)
##
## Call:
## lm(formula = age_c ~ sex, data = b1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.315 -14.252 -0.252 12.748 42.748
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5605 0.5538 1.012 0.312
## sex2 -1.0625 0.7625 -1.393 0.164
##
## Residual standard error: 16.64 on 1909 degrees of freedom
## Multiple R-squared: 0.001016, Adjusted R-squared: 0.0004928
## F-statistic: 1.942 on 1 and 1909 DF, p-value: 0.1636
It doesn’t appear so. Let’s use a logit to see if Limbaugh listening is related to sex. I am sure that it will be.
b6 <- glm(limbaugh~sex, data=b1, family=binomial(link='logit'))
summary(b6)
##
## Call:
## glm(formula = limbaugh ~ sex, family = binomial(link = "logit"),
## data = b1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5827 -0.5827 -0.4013 -0.4013 2.2623
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.68719 0.09168 -18.404 < 2e-16 ***
## sex2 -0.79129 0.14933 -5.299 1.17e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1360.7 on 1910 degrees of freedom
## Residual deviance: 1331.4 on 1909 degrees of freedom
## AIC: 1335.4
##
## Number of Fisher Scoring iterations: 5
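The `sex2` coefficient is on the log-odds scale; exponentiating converts it to an odds ratio, which is easier to talk about:

```r
# Converting the sex2 log-odds estimate from summary(b6) to an odds ratio.
b_sex2 <- -0.79129
exp(b_sex2)   # ~0.45: the odds of Limbaugh listening for sex = 2 are
              # roughly 45% of the odds for sex = 1
```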
Indeed, Limbaugh listening is significantly related to sex. Let’s see if the interaction between sex and Limbaugh listening is significantly related to Clinton feelings.
b7 <- lm(clinroot ~ sex*limbaugh, data=b1)
Anova(b7, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 22086.3 1 2075.8637 < 2.2e-16 ***
## sex 390.5 1 36.7067 1.65e-09 ***
## limbaugh 1037.7 1 97.5339 < 2.2e-16 ***
## sex:limbaugh 10.6 1 0.9964 0.3183
## Residuals 20289.7 1907
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b7)
##
## Call:
## lm(formula = clinroot ~ sex * limbaugh, data = b1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3494 -2.4305 0.9408 2.8702 7.5695
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.3837 0.1182 45.562 < 2e-16 ***
## sex2 0.9656 0.1594 6.059 1.65e-09 ***
## limbaugh1 -2.9532 0.2990 -9.876 < 2e-16 ***
## sex2:limbaugh1 -0.4862 0.4871 -0.998 0.318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.262 on 1907 degrees of freedom
## Multiple R-squared: 0.1109, Adjusted R-squared: 0.1095
## F-statistic: 79.28 on 3 and 1907 DF, p-value: < 2.2e-16
The interaction between sex and Limbaugh listening is non-significant. Let’s test both interactions and see what we find.
b8 <- lm(clinroot ~ sex*limbaugh + limbaugh*age_c, data=b1)
Anova(b8, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 22062.7 1 2078.5823 < 2.2e-16 ***
## sex 388.3 1 36.5819 1.758e-09 ***
## limbaugh 815.5 1 76.8312 < 2.2e-16 ***
## age_c 4.0 1 0.3751 0.54033
## sex:limbaugh 6.0 1 0.5650 0.45236
## limbaugh:age_c 48.5 1 4.5704 0.03266 *
## Residuals 20220.3 1905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b8)
##
## Call:
## lm(formula = clinroot ~ sex * limbaugh + limbaugh * age_c, data = b1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4272 -2.3830 0.8959 2.8233 8.0029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.382156 0.118052 45.591 < 2e-16 ***
## sex2 0.963164 0.159246 6.048 1.76e-09 ***
## limbaugh1 -2.731291 0.311601 -8.765 < 2e-16 ***
## age_c -0.002952 0.004820 -0.612 0.5403
## sex2:limbaugh1 -0.367395 0.488791 -0.752 0.4524
## limbaugh1:age_c -0.031016 0.014508 -2.138 0.0327 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.258 on 1905 degrees of freedom
## Multiple R-squared: 0.1139, Adjusted R-squared: 0.1116
## F-statistic: 48.99 on 5 and 1905 DF, p-value: < 2.2e-16
YES! Our effect is still significant. Let’s check out the three way interaction just to make sure that there’s no higher order interaction.
b9 <- lm(clinroot ~ sex*limbaugh*age_c, data=b1)
Anova(b9, type = 3)
## Anova Table (Type III tests)
##
## Response: clinroot
## Sum Sq Df F value Pr(>F)
## (Intercept) 22070.4 1 2081.2371 < 2.2e-16 ***
## sex 380.9 1 35.9143 2.46e-09 ***
## limbaugh 848.3 1 79.9976 < 2.2e-16 ***
## age_c 0.6 1 0.0564 0.8122
## sex:limbaugh 0.1 1 0.0135 0.9076
## sex:age_c 8.2 1 0.7721 0.3797
## limbaugh:age_c 9.2 1 0.8687 0.3514
## sex:limbaugh:age_c 19.4 1 1.8323 0.1760
## Residuals 20180.3 1903
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b9)
##
## Call:
## lm(formula = clinroot ~ sex * limbaugh * age_c, data = b1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5288 -2.3369 0.9276 2.7655 7.7701
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.384651 0.118031 45.621 < 2e-16 ***
## sex2 0.955372 0.159419 5.993 2.46e-09 ***
## limbaugh1 -2.852153 0.318885 -8.944 < 2e-16 ***
## age_c 0.001701 0.007158 0.238 0.812
## sex2:limbaugh1 0.063388 0.546323 0.116 0.908
## sex2:age_c -0.008504 0.009678 -0.879 0.380
## limbaugh1:age_c -0.017422 0.018692 -0.932 0.351
## sex2:limbaugh1:age_c -0.040474 0.029900 -1.354 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.256 on 1903 degrees of freedom
## Multiple R-squared: 0.1157, Adjusted R-squared: 0.1124
## F-statistic: 35.56 on 7 and 1903 DF, p-value: < 2.2e-16
So I think we’re in good shape here. The three-way interaction is non-significant, suggesting that our previous model’s (b8) interaction is sound. Let’s examine the effects further.
sex.labs <- c("Males", "Females")
names(sex.labs) <- c(1,2)
graph <- ggplot(b8, aes(age_c, clinroot, color=limbaugh)) + stat_smooth(method=lm) +
  ggtitle("Age and Limbaugh Listening as Factors \n in Hillary Clinton Feeling Thermometer") +
scale_x_continuous(name = "Age (Centered)") +
scale_y_continuous(name = "Clinton Feeling Thermometer") +
theme(axis.line.x = element_line(linewidth=.5, colour = "black"),
axis.line.y = element_line(linewidth=.5, colour = "black"),
axis.text.x=element_text(colour="black", size = 9),
axis.text.y=element_text(colour="black", size = 9),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
plot.title = element_text(size = 14, family = "serif", face = "bold", hjust = .5),
text=element_text(family="serif"))
graph <- graph + scale_colour_discrete(name="Limbaugh\nListener",
                                       breaks=c("0", "1"),
                                       labels=c("No", "Yes"))
graph <- graph + facet_grid(. ~ sex, labeller = labeller(sex = sex.labs))
graph
## `geom_smooth()` using formula = 'y ~ x'
This graph suggests that Limbaugh listeners who are male decrease in their liking for Hillary Clinton as their age increases, but only slightly. However, females who are Limbaugh listeners decrease in their liking for Hillary Clinton pretty dramatically as their age increases. Very interesting!